DOI: 10.1145/775047.775112
Article

Topics in 0--1 data

Published: 23 July 2002

ABSTRACT

Large 0--1 datasets arise in various applications, such as market basket analysis and information retrieval. We concentrate on the study of topic models, aiming at results which indicate why certain methods succeed or fail. We describe simple algorithms for finding topic models from 0--1 data. We give theoretical results showing that the algorithms can discover the epsilon-separable topic models of Papadimitriou et al. We present empirical results showing that the algorithms find natural topics in real-world data sets. We also briefly discuss the connections to matrix approaches, including nonnegative matrix factorization and independent component analysis.
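
To make the setting concrete, the following is a minimal sketch of a toy topic model for 0--1 data. It is not the algorithm from the paper. It assumes an idealized case in which each topic owns a disjoint set of attributes, and it groups attributes by pairwise co-occurrence lift. The function names (`generate`, `lift`, `group_attributes`), the probabilities, and the threshold are all hypothetical choices made for illustration.

```python
import random
from itertools import combinations


def generate(n_rows, topics, n_attrs, p_topic=0.5, p_attr=0.6, seed=0):
    """Sample 0-1 rows from a toy topic model (an assumption, not the paper's model).

    Each row activates every topic independently with probability p_topic;
    an active topic switches on each of its attributes with probability p_attr.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        row = [0] * n_attrs
        for attrs in topics:
            if rng.random() < p_topic:
                for a in attrs:
                    if rng.random() < p_attr:
                        row[a] = 1
        rows.append(row)
    return rows


def lift(rows, a, b):
    """Estimate P(a and b) / (P(a) * P(b)); about 1.0 for independent attributes."""
    n = len(rows)
    pa = sum(r[a] for r in rows) / n
    pb = sum(r[b] for r in rows) / n
    pab = sum(r[a] and r[b] for r in rows) / n
    return pab / (pa * pb) if pa > 0 and pb > 0 else 0.0


def group_attributes(rows, n_attrs, threshold=1.5):
    """Link attribute pairs whose lift exceeds the threshold and
    return the connected components as sorted lists of attribute indices."""
    parent = list(range(n_attrs))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(n_attrs), 2):
        if lift(rows, a, b) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for a in range(n_attrs):
        groups.setdefault(find(a), []).append(a)
    return sorted(groups.values())


if __name__ == "__main__":
    true_topics = [[0, 1, 2], [3, 4, 5]]  # hypothetical disjoint topics
    data = generate(5000, true_topics, n_attrs=6)
    print(group_attributes(data, n_attrs=6))
```

With these default parameters, two attributes of the same topic have an expected lift of 2.0 and attributes of different topics have an expected lift of 1.0, so a threshold between the two separates them on a sample of this size. Real data with overlapping or noisy topics would need a more careful method.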

References

  1. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD '93, pages 207--216, 1993.
  2. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, chapter 12, pages 307--328. AAAI Press, 1996.
  3. A. L. Berger, S. A. Della Pietra, and V. J. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39--71, 1996.
  4. I. V. Cadez, P. Smyth, and H. Mannila. Probabilistic modeling of transaction data with applications to profiling, visualization, and prediction. In KDD 2001, pages 37--46, San Francisco, CA, Aug. 2001.
  5. M. A. Carreira-Perpinan and S. Renals. Practical identifiability of finite mixtures of multivariate Bernoulli distributions. Neural Computation, 12:141--152, 2000.
  6. P. Comon. Independent component analysis --- a new concept? Signal Processing, 36:287--314, 1994.
  7. G. Das, H. Mannila, and P. Ronkainen. Similarity of attributes by external probes. In Knowledge Discovery and Data Mining, pages 23--29, 1998.
  8. S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391--407, 1990.
  9. S. Della Pietra, V. J. Della Pietra, and J. D. Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380--393, 1997.
  10. M. Gyllenberg, T. Koski, E. Reilink, and M. Verlaan. Non-uniqueness in probabilistic numerical identification of bacteria. Journal of Applied Probability, 31:542--548, 1994.
  11. T. Hofmann. Probabilistic latent semantic indexing. In SIGIR '99, pages 50--57, Berkeley, CA, 1999.
  12. A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, 2001.
  13. D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788--791, Oct. 1999.
  14. C. H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala. Latent semantic indexing: A probabilistic analysis. In PODS '98, pages 159--168, June 1998.
  15. D. Pavlov, H. Mannila, and P. Smyth. Probabilistic models for query approximation with large sparse binary datasets. In UAI-2000, 2000.
  16. D. Pavlov and P. Smyth. Probabilistic query models for transaction data. In KDD 2001, 2001.
  17. J. W. Sammon. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, 18(5):401--409, May 1969.

      • Published in

        KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
        July 2002
        719 pages
        ISBN:158113567X
        DOI:10.1145/775047

        Copyright © 2002 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        KDD '02 Paper Acceptance Rate: 44 of 307 submissions, 14%
        Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
